Discontinuous Parsing with an Efficient and Accurate DOP Model

نویسندگان

  • Andreas van Cranenburgh
  • Rens Bod
چکیده

We present a discontinuous variant of treesubstitution grammar (tsg) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a tsg from the treebank by extracting fragments that occur at least twice. We give a direct comparison of a tree-substitution grammar implementation that implicitly represents all fragments from the treebank, versus one that explicitly operates with a significant subset. On the task of discontinuous parsing of German, the latter approach yields a 16 % relative error reduction, requiring only a third of the parsing time and grammar size. Finally, we evaluate the model on several treebanks across three Germanic languages.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar

Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the...

متن کامل

Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity

It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discont...

متن کامل

Data-Oriented Parsing

1. A DOP model for phrase-structure trees R. Bod and R. Scha 2. Probability models for DOP R. Bonnema 3. Encoding frequency information in stochastic parsing models 1. Computational complexity of disambiguation under DOP K. Sima'an 2. Parsing DOP with Monte Carlo techniques J. Chappelier and M. Rajman 3. Towards efficient Monte Carlo parsing R. Bonnema 4. Efficient parsing of DOP with PCFG-redu...

متن کامل

Efficient Parsing of Domain Language

We present a new framework for specializing both Broad-Coverage Grammars and probabilistic parsers to limited domains of language use. This framework exploits statistically measurable biases in a limited domain to fit the parser coverage to the domain with manageable accuracy loss. An algorithm is presented and applied to the DataOriented Parsing model. We exhibit empirical results concerning t...

متن کامل

Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013